Text-to-speech synthesis with arbitrary speaker's voice from average voice
نویسندگان
چکیده
This paper describes a technique for synthesizing speech with any desired voice. The technique is based on an HMM-based text-to-speech (TTS) system and MLLR adaptation algorithm. To generate speech of an arbitrarily given target speaker, speaker-independent speech units, i.e., average voice models, is adapted to the target speaker using MLLR framework. In addition to spectrum and pitch adaptation, we derive an algorithm for adaptation of state duration. We demonstrate that a few sentences uttered by a target speaker are sufficient to adapt not only voice characteristics but also prosodic features. Synthetic speech generated from adapted models using only four sentences is very close to that from speaker dependent models trained using a large amount of speech data.
منابع مشابه
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
متن کاملSpectral voice conversion for text-to-speech synthesis
A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented. It is applied to a residualexcited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted ...
متن کاملAverage-Voice-Based Speech Synthesis
This thesis describes a novel speech synthesis framework " Average-Voice-based Speech Synthesis. " By using the speech synthesis framework, synthetic speech of arbitrary target speakers can be obtained robustly and steadily even if speech samples available for the target speaker are very small. This speech synthesis framework consists of speaker normalization algorithm for the parameter cluster...
متن کاملGMM Classification of TTS Synthesis: Identification of Original Speaker's Voice
This paper describes two experiments. The first one deals with evaluation of synthetic speech quality by reverse identification of original speakers whose voices had been used for several Czech text-to-speech (TTS) systems. The second experiment was aimed at evaluation of the influence of voice transformation on the original speaker recognition. The paper further describes an analysis of the in...
متن کاملMultimodal Speech Synthesis
The main goal of the speech synthesis group in SmartKom was to develop a natural sounding synthetic voice for the avatar " Smartakus " that is judged to be agreeable, intelligible, and friendly by the users of the SmartKom system. Two aspects of the SmartKom scenario facilitate the achievement of this goal. First, since speech output is mainly intended for the interaction of Smartakus with the ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001